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Abstract 



A generic model of a random polypeptide chain, with discrete torsional 
degrees of freedom and Hookean springs connecting pairs of hydrophobic 
residues, reproduces the energy probability distribution of real proteins over 
a very large range of energies. We show that this system with harmonic in- 
teractions, under dissipative dynamics driven by random noise, leads to a 
distribution of energy states obeying a modified one-dimensional Ornstein- 
Uhlenbeck process and giving rise to the so called Wigner distribution. A 
tunably fine- or coarse-grained sampling of the energy landscape yields a 
family of distributions for the energies and energy spacings. 
PACS 5.65+b,5.70Ln,87.17.Aa 
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I. INTRODUCTION 



We present a generic model hamiltonian for polypeptide chains which we believe captures 
the essential mechanism driving the folding process, namely hydrophobic interactions |l]-[7] , 
and is able to reproduce the distribution of energy states of real proteins 0. 

We take the view here that the protein in its native state must essentially correspond 
to a self-organized system, i.e., the "native state" should be concieved of as the attractor of 
a dynamics. This typically corresponds not to a unique conformation but to a set of con- 
formations to which the trajectory of the phase point representing the molecule is confined 
after asymptotically long times. 

Our model involves N coupled, discrete, over-damped torsional degrees of freedom cou- 
pled by Hookean forces and driven by random noise. As a numerical realization of this 
dissipative system, we explore the phase space under a dynamics based on relaxing pairs of 
rotational degrees of freedom, namely the dihedral angles, sampled with a probability which 
is a function of the conjugate torques, 
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The energy landscape is effectively coarse- or fine-grained by tuning the parameter 77. For 
7] = 0, the dynamics is identical to very high temperature Monte Carlo simulations. 

We find that the energy probability distribution obtained from our simulations may be 



very well represented by a Wigner distribution p|-|i0f which, for a random quantity S, is 
given by 

P(S) ~ S exp(-^ 2 ) (2) 

while the coarse grained energy level distributions are comparable with the n'th (n = 
1,2,3...) neighbor energy level spacing statistics encountered Jill i n the study of large 
nuclei. The energy histograms can also be very well fitted with an "inverse Gaussian" (IG) 
distribution. 

We are able to show that for harmonic potentials, quite independently of the nature of the 
sequence of hydrophobic and polar residues, or the dimensionality of the space, the energy 
of the system obeys a modified Ornstein-Uhlenbeck (OU) process ||12|| . The stationary 
state distribution for this process with reflecting boundary conditions introduced due to 
constraints, may then be related to the energy distribution. As a bonus, we are also able to 



understand the distribution of relaxation times found for global optimization problems [13 
by Li and coworkers. 

We have already reported in a separate publication J7| that under Metropolis Monte Carlo 
dynamics, with random initial conditions, the model exhibits power law relaxation for the 
initial stages of decay, and at the later stages the relaxation obeys a stretched exponential 
with the exponent f3 ~ 1/4. This Kohlrausch- Williams- Watts type relaxation behaviour 
is observed experimentally for real proteins [P^I7|. At zero temperature the probability 
distribution function of the energy steps encountered along a relaxation path in phase space 
also obeys a stretched exponential form, with another exponent a ~ 0.39. In J7| we show 
that (3 = a /(a + 1), which yields a value for (3 in very good agreement with our simulation 
results. 



2 



The paper is organized as follows. In section 2 we define our model, in section 3 we 
present our simulation results, in section 4 we show that the energy obeys an OU rocess. 
In section 5 we discuss the relationship between this model and other complex systems and 
outline work in progress. 

II. THE MODEL 

We consider a model |7j consisting of N residues, treated as point vertices, interacting via 
Hookean potentials. We have been motivated by the model proposed by Haliloglu, Bahar, 
Erman M where all interactions between the different residues are governed by confining 
square- law potentials ||-||. In our model, however, the covalent bonds between residues are 
treated as fixed rods of equal length (see Fig.l). The residues located at the vertices may be 
polar P or hydrophobic H. All the hydrophobic vertices are to be connected to each other 
with springs of equal stiffness. This feature mimicks the effective pressure that is exerted on 
the hydrophobic residues by the ambient water molecules, and results in their being driven to 
the relatively less exposed center of the molecule in the low lying energy states, whereas the 
polar residues are closer to the surface. It is important to note that we treat all H — H pairs 
on an equal footing, i.e., there is no "teleological" information that is fed into the system by 
connecting only those H — H pairs which are close to each other in the native configuration 
for a particular sequence. It is known that real proteins are distinguished by H — P sequences 
that lead to unique ground states while a randomly chosen H — P sequence will typically 
give rise to a highly degenerate ground state. In the absence of detailed knowledge regarding 
the rules singling out the realistic H — P sequences we considered a generic H — P sequence 
obtained by choosing fifty percent of the residues to be hydrophobic and distributing them 
randomly along the chain. We have checked that our results were quite robust with respect 
to changing the sequence of hydrophobic or hydrophilic residues, or even taking all of them 
to be hydrophobic. 

The energy of the molecule is 

E = § £ cy|r, - r,| 2 = KJ2 ife (3) 

If we define Qi = 1 for the V th vertex being occupied by a hydrophobic residue, and Qi = 
otherwise, we may write cy = QiQj and 

Vij = [(N H ~ l)c»,i - Cjj-l - Cjj+lRj 

- 0- - 8ij)0- - - 8ij+i)cij ■ ( 4 ) 

We take the bond angles aj, i = 1 . . . , N — 1, to have the alternating values of (— l) l a, 
with a = 68°. The dihedral angles <pi can take on the values of and ±27r/3. The state 
(conformation) of the system is uniquely specified once the numbers {0j} are given. The 
constraints placed on the conformations due to the rigid chemical bond lengths and by 
restricting the chemical and dihedral angles to discrete values prevent the molecule from 
trivially collapsing to a point. The residues effectively reside on the vertices of a tetrahedral 
lattice. The position vectors of each of the vertices in the chain can be expressed in terms 
of a sum over the directors R; of unit length representing the chemical bonds, which may 
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be obtained from Ri by successive rotations Mfc(afc) and Tk(4>k) through the bond and the 
dihedral angles [18], viz., 



i-1 2 

ri = EII T *(^) M *( a *) R i • ( 5 ) 

j=i k=j 

where we may choose Ri to lie along any of the Cartesian directions in our laboratory 
frame without loss of generality. We obtain the torques that act at each of the vertices i by 
substituting this in equation @ and taking the partial derivative with respect to fa, viz., 

n = -dE/dfa . (6) 

The system is assumed to evolve within a viscous environment, subject to random kicks 
from the ambient molecules. We may write the Langevin equation, 

(7) 

where ( r is a friction coefficient and £ r (i,t) is a Gaussian distributed noise term, delta 
correlated in i and in time. Equivalently, in terms of the state vector </> = (fa, . . . , (/>n), we 
have the Langevin equation 

where the torque Tj is a function of all the angles {(f)}, ( r is the appropriate friction coefficient 
and £ T is again a Gaussian random force delta correlated in space and time. Viewed in this 



way the dynamics is similar to a pinned interface or a charge density wave system p9| -|2T 
in 1 + 1 dimensions. 

For the discrete, sequential numerical simulation of the evolution of this system, we 
postulate the following set of rules: 



1. Form the self-similar probability distribution in equation ([!]), P(i) = \Ti\ v /YU l r « 



2. Choose a pair of vertices (i,i') according to this probability distribution over {t{ > 0} 
and {rj < 0}, 

3. Set fa(t + 1) = fa(t) + sign(r l )(2vr/3) . 

Here rj is a tunable parameter defining the dynamics. For large positive values of rj, those 
angles fa with the maximal conjugate torques are incremented; for negative values of 77 the 
small values of the torque are preferred. For rj = the angles to be incremented are picked 
randomly. If one choses rj to be very large, then we find that there is a large probability that 
the most recently updated fa still carries a very large torque, resulting in a jamming of the 
dynamics. Incrementing the dihedral angles with the large conjugate torques resulted not 
in the relaxation of these torques but in pumping energy into the system, as when pushing 
a swing at the top of its arc. After applying the search strategy based on changing the 
torques according to a distribution, we found that updating the maximal torques (77 > 0) 
drives the system to a state with relatively high energies, whereas a random search (77 = 0) or 
preferentially choosing the minimal torques (77 < 0) gives rise to more successful strategies 
for reaching low lying energy states. Thus it can be said that 77 here plays the role of a 
coarse- or fine-graining parameter in the exploration of the energy landscape. 
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III. DISTRIBUTION OF ENERGY STATES AND LEVEL SPACINGS 



The distribution of the energies of the discrete configurational states explored by the 
chain of iV = 48 residues shown in Fig.l, as it evolves under the above dynamics, is shown 
in Figs. 2-5, for both positive and negative rj. After the first 5000 steps were discarded, the 
statistics were taken over 5000 steps of the trajectory. It can be seen that the shape of the 
curve does not essentially change, while for positive rj the peak shifts to successively higher 
values of the energy, and the distribution is distorted towards a Gaussian, indicating that the 
states explored are less correlated. These figures should be compared with those reported 
by ben-Avraham [IjJ for the density of vibrational states and by Mach et al. |22|] for the 



ultraviolet absorption spectra, and also with the energy histograms obtained by Socci and 
Onuchic [2~3| for a Monte Carlo simulation on a lattice model of proteinlike heteropolymer. 



Our computation seems to be very successful in producing realistic distributions of energy 
states over the whole range of relevant energies. 

We have been able to fit the simulation results very successfully with a distribution of 
the Wigner form (Figs. 2,3) 

f w (E) = a{E - E )e^ E ' E ^ , (9) 

for r) = —6 to 7] = 3, and the parameters for the fit are given in Table la. Here E corresponds 
to the offset due to the lowest energy state attained for the different rj, and it can be seen 
that it shifts the distribution to higher values of the energy for higher values of rj. The 
distributions become Gaussian for r\ = 6 and 77 = 8; the results of the Gaussian fits are 
presented in Table lb. 

It should be mentioned that the same energy distributions may be fitted equally well or 



better by the "inverse Gaussian" ||13|| , where the probability density is given by (see Figs. 
4,5), 



A(E- Bf 
2B 2 E 



(10) 



It will be noted that this has the same functional form as the distribution of first passage 



times over a distance d for an Ornstein Uhlenbeck process |12| with diffusion coefficient D and 
initial drift velocity v, in the regime of small times, if one makes the further identifications 
A = d 2 /(2D) and B = d/v. We postpone until section 4 a discussion of this result. The 
parameters for the fits to the parameters A and B are given in Table II. The estimated errors 
for each fit are also given in the table. We find that both the "diffusion constant (mobility)" 
and the "drift velocity" of the phase point along its trajectory in phase space depend on rj, 
being maximum for rj = and decreasing for positive values of rj. For 77 < they essentially 
stay the same. 

We have also considered the statistics of energy differences between successive energy 
states visited along a trajectory obeying the above dynamics. This does not necessarily mean 
that the energy differences considered here are nearest neighbors on the energy spectrum; 
rather these statistics may be considered a classical analogue of an absorption (or emission) 
spectrum. We found that the distributions were symmetric for AE negative or positive, and 
that they obeyed a stretched exponential distribution (for positive AE), 
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P(AE) ~exp[-(A£) c )] 



(11) 



The distribution of energy steps for different r] are given in Fig. 6a. The plots of 
ln(— \n(P(AE))) versus \n(AE) for rj = and rj = 8 are shown in Fig. 6b. The fits for 
other rj values are equally good. The values of the exponent c are shown in Table III, where 
it can be seen that c starts from small values for r] = —8 and seems to tend to 1 as rj becomes 
large and positive, again exhibiting a decorrelation effect as the energy landscape is probed 
with larger and larger t]. The correlation coefficients showing the goodness of fit for each r\ 
are also given in Table III. 

Finally, it is interesting to make a comparison between the energy states explored under 
the "77" -dynamics and Metropolis Monte Carlo. Since 

P(E) = n(E) exp(- 7 £) , (12) 

n(E) should become identical to P(E) in the limit of 7 — > 0. In Fig.7 we show our results 
for 11(E) with 7 = 10~ 5 , and P(E) for 77 = 0. 



IV. ORNSTEIN-UHLENBECK PROCESS AND THE WIGNER DISTRIBUTION 

We would now like to show that the energy obeys a stochastic process which can be 
modelled by Fokker-Planck equations with the Wigner @ or the inverse Gaussian (|10D 
forms as stationary solutions. 

We remind the reader that an OU process describes the diffusive motion of a particle 
subject to a drift velocity proportional to the distance from the origin [O. It can easily be 
seen that such a process for a single particle in one dimension would be described by the 
Langevin equation, 

doc 1 , , , 

¥ = -£** + {<*) (13) 

with a Hookean force F(x) = —gx and a delta correlated random force £(£), ((£,(t)) 2 ) = o 2 . 
Since there is no explicit time dependence of E, we have 

^-V— — (141 

dt ~ y dn ' dt ' [ } 

Substituting from (0) we get, 

dE 1 _ (dE\ 2 ^dE _ . 

From (Q) we may compute that 

^(dE\ 2 NE „ 

2^ ~- = -7— + 2^ c ifc c jfc( r i - r ^) • ( r i - r fc) • (i 6 ) 

We see that the second term is like an average of the products (rj — r^) ■ (r^- — r^) over 
(i,j) pairs (i 7^ j), and for a reasonably isotropic configuration, it vanishes. To the same 
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approximation, we may assume that the second term in Eq. (|T5|) is itself equal to a Gaussian 
stochastic noise, i.e., set = K Qj( r i — r j) • £,i(t) ■ This yields the required result, 

namely, 

dE NE 

IT = — + ' < 17 > 

This stochastic equation is equivalent [f24| to the Fokker-Planck equation 



dP(E,t) d_ 
dt ~ ~dE 



dE y ' ' dE 



(18) 



for the probability distribution of E, where D = 2{£%) and $(E) = f®(N/( r )xdx = bE 2 /2, 
with b = N/C r . The constraints we have placed on our configurational degrees of freedom 
(see Eq.(|~|)ff.) require that there be some minimum value of the energy where the probability 
current vanishes, implying reflecting boundary conditions there, as well as at some E m3jX , 
which we may take to oo for all practical purposes. To mimick these constraints we introduce 
an infinitely high potential barrier at Eq, while at the same time shifting the point of 
equilibrium of the Hookean "force" to this point. A convenient choice for a singular potential 
to add to $, is — \n(E — Eq). These reflecting boundary conditions at E and at oo then 
lead to a stationary solution P(E), 

P(E) = ae- $(B) , (19) 

where $(E) = <f> - ln(E - E ), or, 

$(£) = -b(E - Eq) 2 - ln(£ - Eq) (20) 

and a is a normalization constant. Substituting (|20| ) in (|T9"| ) leads to the Wigner formula (0). 

A stationary distribution of the inverse Gaussian form may be obtained if we modify the 
quadratic potential $ in a different way to model the constraints in the system, viz., 

W = ^(£-£) 2 + ^ln£ • (21) 

This also leads to reflecting boundary conditions, at E — and E —>■ oo, and a point of 
equilibrium at E = B. As the stationary solution we obtain the inverse Gaussian distribu- 
tion )PH|), as can be seen from direct substitution into (|T9|). 

The distribution of first passage times for the attainment of the optimum solution in 
such diverse high dimensional optimization problems as fits to X-ray patterns, travelling 
salesman problems and determination of the lowest energy state for lattice models of protein 
configurations, have been reported by Li and coworkers [|T3| , to obey the Ornstein-Uhlenbeck 
form. The plots of these distributions all display a striking similarity to each other, and to 
the distribution of energy states which we have found in the present problem. Now we see 
that if an optimization problem has a quadratic cost function C which, in terms of the large 
number of variational parameters in a reasonably isotropic phase space, has the same form 
as our energy Eq.(|3|), then the optimization algorithm defines a dynamics for C which may 
be described by means of an OU process as in Eq. ([l3|) , with a repulsive barrier at C min and 
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at oo. This may be modelled by the same Fokker Planck equation fll8|) , and potentials <3>, 
as we have discussed above. 

Recall that for an OU process, with an initial displacement x(0) = d, the solution for 
the distribution of first passage times t through the origin is given by JO] , 



2yd ( p 



3/2 



/W = ^^j r^y, (22) 

where p = g/( and y = exp(— pi). We see that (p^ ) goes over, in the limit of large times, 
i.e. y C 1, to 



/w 0/ = -^"2/ e-V" . 23 



On the other hand, for very small times, fl2"2 ) becomes, to leading order, 

„ . , 2nda 2 (d-vt) 2 

= (W^^ (24) 



where we have defined pd = v. It should be noted that these functions (|23| and |24j) have 
the same form, as functions of y and t, respectively, as the Wigner and inverse Gaussian 
distributions which we have found above, and numerically are very similar to each other. 



V. DISCUSSION 

The experimentally reported density of vibrational energy states and UV absorption 
spectra for a large number of real proteins fall on a universal curve [|T|,|22|1, which we have 
been able to fit extremely well by distributions of the Wigner or inverse Gaussian form. 
Our simple coarse-grained model for the hydrophobic interactions reproduces very closely 
the same distribution of energy states. Moreover, this same curve appears to describe the 
distribution of first passage times in rather general global optimization problems |TB[. We 
have provided an explanation for this fact on the basis of the Hookean model of hydrophobic 
interactions. 

We would like to make a few concluding remarks about about a remaining puzzle, which is 
the connection between the Wigner distribution, which arises as the distribution of eigenvalue 
spacings for Gaussian orthagonal matrices [|3|- |T0||2~5| , and the present limiting form we have 
found for the distribution of energy states. The puzzle promises to be fruitful, because we 
believe that the current state of the art of understanding the energy landscapes of proteins is 
very similar to that of the study of the energy spectra of very large nuclei in the '50s, when it 
was realized that the problem might be very advantageously treated as a statistical one. For 
real, large nuclei, the level spacing distribution, properly scaled with the average density of 
states for different energies, exhibit a remarkable invariance over the entire energy range ||. 
It was then found that the energy levels and nearest-neighbor spacings are governed to a large 
extent by the statistics of eigenvalues and eigenvalue spacings of orthogonal matrices with 
a Gaussian distribution of matrix elements [|S|- |iT||2"5|j . The statistics of nuclear energy level 



spacings approximately obey the so called Wigner "surmise" (g). This is the distribution 



S 



of eigenvalue spacings for 2x2 Gaussian orthogonal matrices which can be also obtained 
by considering the square root of the sum of the squares of two independent but identically 
distributed Gaussian random variables of zero mean. 

It is of great interest to note that the so-called Wigner distribution arises also in "quan- 



tum chaos" [pq-|2q| and in the energy spectra of large atomic clusters |29j . Thus, there seems 
to be a universality to the spectral fluctuations of confined systems of sufficient complex- 
ity @p 

It is very intrigueging to compare the results for r] > (Fig. (2)) with the numerically 
obtained nth neighbor spacing distributions of the eigenvalues for Gaussian orthagonal ma- 



trices, as reported by Porter flllfl , where the identical shift of the peak and tendency to a 
symmetric Gaussian distribution is found. This we interpret as reinforcing our observation 
that larger r\ dynamics results in a more coarse-grained sampling of the energy landscape. 
Nevertheless, the connection to n'th neighbor spacing distributions of eigenvalues of random 
matrices and the Ornstein-Uhlenbeck process still remains to be understood. 
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TABLE CAPTIONS 



Table la The parameters a, b and Eq used for fitting the energy histograms to the Wigner 
distribution f w (E) = a(E - E )e- b ^ E - Eo ^ ._ 

Table lb The mean a and the variance b used for fitting the energy histograms to the 
Gaussian distribution. 

Table II The parameters A, B used for fitting the energy histograms to the inverse Gaus- 
sian distribution (see Eq.flTUP). The estimated errors are also provided. (Calculated using 
Levenberg-Marquart algorithm) 

Table III The parameter c used for fitting the distribution of energy steps to a stretched 
exponential in the form P(AE) ~ exp[— (AE) C ]. The correlation coefficients (r 2 ) are also 
provided. 
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FIGURE CAPTIONS 



Fig. 1 A chain of N = 48 residues, half of which are randomly chosen to be hydrophobic, 
(darker beads) shown in a random initial configuration. (Generated using RasMol V2.6) 
Fig. 2 The normalized energy histograms, averaged over 10 random initial states for chains 
of N = 48, for different rj > 0, along paths of 10 4 steps, with the 

first 5000 steps discarded. The fits are to the Wigner distribution for rj = 0, 1, 3 and 
Gaussian distribution for rj = 8. 

Fig. 3 The normalized energy histograms, for chains of iV = 48, for different rj < (see 
Fig. 2). The fits are to the Wigner distribution. 

Fig. 4 The normalized energy histograms along trajectories in phase space for the N = 48 
chain, for 77 > as in Fig.2, fitted with the "inverse Gaussian" distribution given in Eq. (|iT)|). 
Fig. 5 Energy histograms for 77 < as in Fig. 4, fitted with the "inverse Gaussian" distri- 
bution given in Eq. (|10"D , for the N = 48 chain. 

Fig. 6a The distribution of energy steps along a trajectory in phase space according to the 
77- dynamics of the N = 48 chain, for different 77. The last 5000 steps along a 10000 steps 
trajectory were considered. 

Fig. 6b The fits, for rj = (left) and for 7/ = 8 (right) to the stretched exponential form 
~ exp(—AE c ), for c = 0.58 and c = 0.81 respectively, in the large AE limit. 
Fig. 7 Energy histogram for rj = and the density of states, n(E), for 7 = 10~ 5 obtained 
for rj- dynamics and the Metropolis Monte Carlo respectively. 
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TABLES 



Table la 



V 


a (10 4 ) 


b (10 7 ) 


E 


-6 


1.50 


15.0 


420 


-4 


1.50 


15.0 


380 


-2 


2.00 


15.0 


350 





1.25 


8.7 


300 


1 


0.40 


2.0 


950 


3 


0.37 


1.2 


1300 
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Table lb 



V 


a 


b (10 6 ) 


6 


4300 


2.2 


8 


4800 


2.7 
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Table II 



V 


A (xlO 3 ) 


AA(xlO) 


B (XlO 3 ) 


AB(xlO) 


-6 


7.4 


48.0 


1.2 


2.3 


-4 


6.8 


10.0 


1.2 


0.5 


-2 


6.6 


6.4 


1.1 


0.3 





6.3 


4.9 


1.4 


0.4 


1 


18.4 


7.9 


3.1 


0.8 


3 


28.3 


32.2 


4.0 


1.2 


6 


33.7 


39.5 


4.6 


1.4 


8 


38.3 


59.2 


5.2 


2.0 
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Table III 



77 


c 


r 2 (corr.coef.) 


-8 


0.50 


0.89 


-6 


0.49 


0.97 


-4 


0.54 


0.98 


-2 


0.54 


0.98 





0.58 


0.97 


1 


0.74 


0.95 


2 


0.73 


0.96 


3 


0.81 


0.95 


4 


0.73 


0.96 


6 


0.85 


0.95 


8 


0.81 


0.95 
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